Embedding similarity evaluation: English-Chinese

Report levels: Level 0, Basic Overview; Level 1, Retrieval; Level 2, Topic-Level; Level 3, Error-Type.
Per-model views: Model Info, Dist Chart, Rand Chart, Basic Stats, CMC Curve, Retrieval Stats, UMAP (Topics & Lang1), UMAP (Topics & Lang2), UMAP (Sent-Len), Topic Avg Cosine, Error % Bar Chart. Only the Model Info, Basic Stats, and Retrieval Stats tables are reproduced below.
all_mpnet_base_v2
Model Info
  arch                  mpnet
  hidden_size           768
  layers                12
  vocab_size            30527
Basic Stats
  mean_true             0.287149
  std_true              0.224910
  mean_random           0.117413
  std_random            0.085043
  snr                   1.995884
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.100900
  precision@k           0.020180
  mean reciprocal rank  0.236851
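The report does not define snr, but every Basic Stats table is consistent with snr = (mean_true - mean_random) / std_random: for all_mpnet_base_v2, (0.287149 - 0.117413) / 0.085043 ≈ 1.9959. The sketch below computes the Basic Stats quantities from two lists of cosine similarities under stated assumptions: the function names are hypothetical, population standard deviation (ddof = 0) is a guess, and ks_p_value is presumed to come from a two-sample Kolmogorov-Smirnov test such as `scipy.stats.ks_2samp`, so the sketch stops at the KS statistic D rather than reimplementing the p-value.

```python
import math


def mean_std(values):
    """Mean and population standard deviation (ddof=0 is an assumption)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((x - mu) ** 2 for x in values) / n
    return mu, math.sqrt(var)


def snr(mean_true, mean_random, std_random):
    """Gap between true-pair and random-pair mean similarity, in random-pair std units."""
    return (mean_true - mean_random) / std_random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    # Once either sample is exhausted the CDF gap can only shrink, so D is final.
    return d


def basic_stats(true_sims, random_sims):
    """Hypothetical helper mirroring the Basic Stats rows (p-value omitted)."""
    mean_true, std_true = mean_std(true_sims)
    mean_random, std_random = mean_std(random_sims)
    return {
        "mean_true": mean_true,
        "std_true": std_true,
        "mean_random": mean_random,
        "std_random": std_random,
        "snr": snr(mean_true, mean_random, std_random),
        "ks_statistic": ks_statistic(true_sims, random_sims),
    }


# Reproduce the reported snr of all_mpnet_base_v2 from its own table values.
print(round(snr(0.287149, 0.117413, 0.085043), 4))  # 1.9959
```

The same check reproduces the other rows to within rounding, for example LaBSE (7.486350) and distiluse_base_multilingual_cased_v2 (9.952019).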
all_MiniLM_L6_v2
Model Info
  arch                  bert
  hidden_size           384
  layers                6
  vocab_size            30522
Basic Stats
  mean_true             0.252059
  std_true              0.232693
  mean_random           0.082790
  std_random            0.083890
  snr                   2.017742
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.105700
  precision@k           0.021140
  mean reciprocal rank  0.241118
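In every Retrieval Stats table, recall@k is exactly five times precision@k (0.105700 and 0.021140 for all_MiniLM_L6_v2), which fits k = 5 with a single correct translation per query. A minimal sketch under that reading, with hypothetical function names: given the rank of each query's true translation among all candidates, it computes the three reported metrics and the CMC curve, whose value at rank k equals recall@k.

```python
def retrieval_metrics(ranks, k=5):
    """ranks[i]: 1-based rank of query i's true translation among all candidates.

    Assumes exactly one relevant candidate per query.
    """
    n = len(ranks)
    hits = sum(1 for r in ranks if r <= k)
    recall_at_k = hits / n                 # share of queries whose match is in the top k
    precision_at_k = hits / (n * k)        # each hit is 1 relevant item among k returned
    mrr = sum(1.0 / r for r in ranks) / n  # mean of 1/rank over all queries
    return recall_at_k, precision_at_k, mrr


def cmc_curve(ranks, max_rank):
    """Cumulative match characteristic: share of queries matched at rank <= r, r = 1..max_rank."""
    n = len(ranks)
    return [sum(1 for x in ranks if x <= r) / n for r in range(1, max_rank + 1)]


ranks = [1, 1, 2, 7, 3]  # hypothetical ranks for five queries
print(retrieval_metrics(ranks))
print(cmc_curve(ranks, 5))  # [0.4, 0.6, 0.8, 0.8, 0.8]
```

Because both metrics count the same hits, recall@k / precision@k = k whenever there is at least one hit, which is the factor of 5 seen throughout the tables.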
all_roberta_large_v1
Model Info
  arch                  roberta
  hidden_size           1024
  layers                24
  vocab_size            50265
Basic Stats
  mean_true             0.206714
  std_true              0.236873
  mean_random           0.066457
  std_random            0.072936
  snr                   1.923015
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.085100
  precision@k           0.017020
  mean reciprocal rank  0.229314
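The report does not say how the true and random similarity distributions are built. One plausible reading, treated here as an assumption, is that mean_true averages the cosine similarity between each English sentence and its own Chinese translation, while mean_random averages it over mismatched English-Chinese pairs. The sketch below builds both lists from precomputed embeddings; the function names and toy vectors are hypothetical stand-ins for real model output, and the random pairing (a shuffle that never maps a sentence to its own translation) is one choice among several.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_similarities(en_vecs, zh_vecs, seed=0):
    """Cosine similarities for aligned (true) and mismatched (random) EN-ZH pairs.

    Assumption: en_vecs[i] and zh_vecs[i] embed a sentence and its translation.
    """
    n = len(en_vecs)
    if n < 2 or n != len(zh_vecs):
        raise ValueError("need two equal-length lists with at least two pairs")
    true_sims = [cosine(e, z) for e, z in zip(en_vecs, zh_vecs)]
    rng = random.Random(seed)
    perm = list(range(n))
    # Reshuffle until no English sentence is paired with its own translation.
    while any(i == p for i, p in enumerate(perm)):
        rng.shuffle(perm)
    random_sims = [cosine(en_vecs[i], zh_vecs[p]) for i, p in enumerate(perm)]
    return true_sims, random_sims


en = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy stand-ins for English embeddings
zh = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.9]]  # toy stand-ins for Chinese embeddings
true_sims, random_sims = pair_similarities(en, zh)
print(true_sims, random_sims)
```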
paraphrase_mpnet_base_v2
Model Info
  arch                  mpnet
  hidden_size           768
  layers                12
  vocab_size            30527
Basic Stats
  mean_true             0.207840
  std_true              0.249521
  mean_random           0.078083
  std_random            0.074334
  snr                   1.745592
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.099700
  precision@k           0.019940
  mean reciprocal rank  0.240206
paraphrase_MiniLM_L6_v2
Model Info
  arch                  bert
  hidden_size           384
  layers                6
  vocab_size            30522
Basic Stats
  mean_true             0.193350
  std_true              0.246487
  mean_random           0.090438
  std_random            0.088177
  snr                   1.167098
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.087500
  precision@k           0.017500
  mean reciprocal rank  0.232547
bert_base_nli_mean_tokens
Model Info
  arch                  bert
  hidden_size           768
  layers                12
  vocab_size            30522
Basic Stats
  mean_true             0.405271
  std_true              0.217462
  mean_random           0.283704
  std_random            0.106589
  snr                   1.140522
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.097650
  precision@k           0.019530
  mean reciprocal rank  0.236385
LaBSE
Model Info
  arch                  bert
  hidden_size           768
  layers                12
  vocab_size            501153
Basic Stats
  mean_true             0.862219
  std_true              0.087595
  mean_random           0.147383
  std_random            0.095485
  snr                   7.486350
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.986000
  precision@k           0.197200
  mean reciprocal rank  0.976524
distiluse_base_multilingual_cased_v2
Model Info
  arch                  distilbert
  hidden_size           768
  layers                6
  vocab_size            119547
Basic Stats
  mean_true             0.842284
  std_true              0.109548
  mean_random           0.023780
  std_random            0.082245
  snr                   9.952019
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.975900
  precision@k           0.195180
  mean reciprocal rank  0.964685
paraphrase_multilingual_MiniLM_L12_v2
Model Info
  arch                  bert
  hidden_size           384
  layers                12
  vocab_size            250037
Basic Stats
  mean_true             0.845384
  std_true              0.109365
  mean_random           0.175116
  std_random            0.133645
  snr                   5.015300
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.957850
  precision@k           0.191570
  mean reciprocal rank  0.938954
paraphrase_multilingual_mpnet_base_v2
Model Info
  arch                  xlm-roberta
  hidden_size           768
  layers                12
  vocab_size            250002
Basic Stats
  mean_true             0.872155
  std_true              0.101651
  mean_random           0.230564
  std_random            0.128888
  snr                   4.977911
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.960450
  precision@k           0.192090
  mean reciprocal rank  0.944427
paraphrase_xlm_r_multilingual_v1
Model Info
  arch                  xlm-roberta
  hidden_size           768
  layers                12
  vocab_size            250002
Basic Stats
  mean_true             0.835749
  std_true              0.109826
  mean_random           0.202857
  std_random            0.108693
  snr                   5.822727
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.965800
  precision@k           0.193160
  mean reciprocal rank  0.952686
xlm_r_distilroberta_base_paraphrase_v1
Model Info
  arch                  xlm-roberta
  hidden_size           768
  layers                12
  vocab_size            250002
Basic Stats
  mean_true             0.835749
  std_true              0.109826
  mean_random           0.202857
  std_random            0.108693
  snr                   5.822727
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.965800
  precision@k           0.193160
  mean reciprocal rank  0.952686
stsb_xlm_r_multilingual
Model Info
  arch                  xlm-roberta
  hidden_size           768
  layers                12
  vocab_size            250002
Basic Stats
  mean_true             0.885081
  std_true              0.100873
  mean_random           0.184520
  std_random            0.152950
  snr                   4.580335
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.948050
  precision@k           0.189610
  mean reciprocal rank  0.932879
xlm_r_bert_base_nli_stsb_mean_tokens
Model Info
  arch                  xlm-roberta
  hidden_size           768
  layers                12
  vocab_size            250002
Basic Stats
  mean_true             0.885081
  std_true              0.100873
  mean_random           0.184520
  std_random            0.152950
  snr                   4.580335
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.948050
  precision@k           0.189610
  mean reciprocal rank  0.932879
xlm_r_100langs_bert_base_nli_stsb_mean_tokens
Model Info
  arch                  xlm-roberta
  hidden_size           768
  layers                12
  vocab_size            250002
Basic Stats
  mean_true             0.885081
  std_true              0.100873
  mean_random           0.184520
  std_random            0.152950
  snr                   4.580335
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.948050
  precision@k           0.189610
  mean reciprocal rank  0.932879
xlm_r_100langs_bert_base_nli_mean_tokens
Model Info
  arch                  xlm-roberta
  hidden_size           768
  layers                12
  vocab_size            250002
Basic Stats
  mean_true             0.932605
  std_true              0.065322
  mean_random           0.362366
  std_random            0.161215
  snr                   3.537126
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.939350
  precision@k           0.187870
  mean reciprocal rank  0.922966
distilbert_multilingual_nli_stsb_quora_ranking
Model Info
  arch                  distilbert
  hidden_size           768
  layers                6
  vocab_size            119547
Basic Stats
  mean_true             0.956673
  std_true              0.036464
  mean_random           0.776440
  std_random            0.068607
  snr                   2.627024
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.898300
  precision@k           0.179660
  mean reciprocal rank  0.864336
quora_distilbert_multilingual
Model Info
  arch                  distilbert
  hidden_size           768
  layers                6
  vocab_size            119547
Basic Stats
  mean_true             0.956673
  std_true              0.036464
  mean_random           0.776440
  std_random            0.068607
  snr                   2.627024
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.898300
  precision@k           0.179660
  mean reciprocal rank  0.864336
xlm_r_large_en_ko_nli_ststb
Model Info
  arch                  xlm-roberta
  hidden_size           1024
  layers                24
  vocab_size            250002
Basic Stats
  mean_true             0.892696
  std_true              0.097706
  mean_random           0.188783
  std_random            0.151826
  snr                   4.636321
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.951150
  precision@k           0.190230
  mean reciprocal rank  0.934592
xlm_r_base_en_ko_nli_ststb
Model Info
  arch                  xlm-roberta
  hidden_size           768
  layers                12
  vocab_size            250002
Basic Stats
  mean_true             0.864184
  std_true              0.103033
  mean_random           0.222771
  std_random            0.169783
  snr                   3.777835
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.924050
  precision@k           0.184810
  mean reciprocal rank  0.899723
clip_ViT_B_32_multilingual_v1
Model Info
  arch                  distilbert
  hidden_size           768
  layers                6
  vocab_size            119547
Basic Stats
  mean_true             0.945092
  std_true              0.039748
  mean_random           0.830551
  std_random            0.065523
  snr                   1.748118
  ks_p_value            0.000000
Retrieval Stats
  recall@k              0.636650
  precision@k           0.127330
  mean reciprocal rank  0.639024
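The Level 2 (Topic-Level) tables and the Topic Avg Cosine chart are not part of this text. If that chart reports the mean true-pair cosine similarity within each topic (an assumption, since the report does not define it), the aggregation is a plain group-by. In the sketch below, the function name, topic labels, and similarity values are all hypothetical.

```python
from collections import defaultdict


def topic_avg_cosine(topics, true_sims):
    """Hypothetical aggregation: mean true-pair cosine similarity per topic label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for topic, sim in zip(topics, true_sims):
        totals[topic] += sim
        counts[topic] += 1
    return {topic: totals[topic] / counts[topic] for topic in totals}


topics = ["news", "news", "sports"]  # hypothetical topic labels
sims = [0.75, 0.25, 0.5]             # hypothetical true-pair similarities
print(topic_avg_cosine(topics, sims))  # {'news': 0.5, 'sports': 0.5}
```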